Goto

Collaborating Authors

 central node






Short-Range Oversquashing

Mishayev, Yaaqov, Sverdlov, Yonatan, Amir, Tal, Dym, Nadav

arXiv.org Artificial Intelligence

Message Passing Neural Networks (MPNNs) are widely used for learning on graphs, but their ability to process long-range information is limited by the phenomenon of oversquashing. This limitation has led some researchers to advocate Graph Transformers as a better alternative, whereas others suggest that it can be mitigated within the MPNN framework, using virtual nodes or other rewiring techniques. In this work, we demonstrate that oversquashing is not limited to long-range tasks, but can also arise in short-range problems. This observation allows us to disentangle two distinct mechanisms underlying oversquashing: (1) the bottleneck phenomenon, which can arise even in low-range settings, and (2) the vanishing gradient phenomenon, which is closely associated with long-range tasks. We further show that the short-range bottleneck effect is not captured by existing explanations for oversquashing, and that adding virtual nodes does not resolve it. In contrast, transformers do succeed in such tasks, positioning them as the more compelling solution to oversquashing, compared to specialized MPNNs.


A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order

Neural Information Processing Systems

Asynchronous parallel optimization received substantial successes and extensive attention recently. One of core theoretical questions is how much speedup (or benefit) the asynchronous parallelization can bring to us. This paper provides a comprehensive and generic analysis to study the speedup property for a broad range of asynchronous parallel stochastic algorithms from the zeroth order to the first order methods. Our result recovers or improves existing analysis on special cases, provides more insights for understanding the asynchronous parallel behaviors, and suggests a novel asynchronous parallel zeroth order method for the first time. Our experiments provide novel applications of the proposed asynchronous parallel zeroth order method on hyper parameter tuning and model blending problems.


Sample Efficient Active Learning of Causal Trees

Kristjan Greenewald, Dmitriy Katz, Karthikeyan Shanmugam, Sara Magliacane, Murat Kocaoglu, Enric Boix Adsera, Guy Bresler

Neural Information Processing Systems

Causal discovery from observational and interventional data is a fundamental problem and prevalent in multiple areas of science and engineering (Pearl, 2009; Spirtes et al., 2000; Peters et al., 2017). Learning the underlying causal mechanisms is essential for policy design.


Local Virtual Nodes for Alleviating Over-Squashing in Graph Neural Networks

Karabulut, Tuğrul Hasan, Baytaş, İnci M.

arXiv.org Artificial Intelligence

--Over-squashing is a challenge in training graph neural networks for tasks involving long-range dependencies. In such tasks, a GNN's receptive field should be large enough to enable communication between distant nodes. However, gathering information from a wide range of neighborhoods and squashing its content into fixed-size node representations makes message-passing vulnerable to bottlenecks. Graph rewiring and adding virtual nodes are commonly studied remedies that create additional pathways around bottlenecks to mitigate over-squashing. However, these techniques alter the input graph's global topology and disrupt the domain knowledge encoded in the original graph structure, both of which could be essential to specific tasks and domains. This study presents Local Virtual Nodes (L VN) with trainable embeddings to alleviate the effects of over-squashing without significantly corrupting the global structure of the input graph. The position of the L VNs is determined by the node centrality, which indicates the existence of potential bottlenecks. Thus, the proposed approach aims to improve the connectivity in the regions with likely bottlenecks. Furthermore, trainable L VN embeddings shared across selected central regions facilitate communication between distant nodes without adding more layers. Extensive experiments on benchmark datasets demonstrate that L VNs can enhance structural connectivity and significantly improve performance on graph and node classification tasks. The code can be found at https://github.com/ALLab-Boun/L RAPH Neural Networks (GNNs) have become a standard for representation learning in tasks involving non-Euclidean data. They can handle arbitrary graph topologies without any prior assumptions about their structure. Therefore, the GNNs are integral components in applications of social networks [1], traffic networks [2], and molecular graphs [3]. The flexibility of GNNs and their applicability to various domains mainly stem from the message-passing paradigm, an efficient operation that can handle diverse graph structures at a massive scale [4], [5].